Introducing crucial protein panel of gastric adenocarcinoma disease.

AIM
Because interactome analysis of a disease can yield a candidate biomarker panel, this study uses protein-protein interaction (PPI) network analysis to introduce the crucial proteins involved in gastric adenocarcinoma (GA).


BACKGROUND
Gastric adenocarcinoma (GA) is the most common type of stomach cancer. There is no efficient diagnostic molecular method for GA.


METHOD
The PPI network for 200 genes was constructed using Cytoscape software 3.4.0 and the String Database. The critical nodes were screened on the basis of centrality parameters. Gene ontology analysis of the key proteins, covering pathways and molecular functions, was performed, and the highlighted pathways and activities are discussed.


RESULTS
Among 200 initial genes, 141 genes were included in a main connected network. Seven crucial proteins, including tumor protein p53, epidermal growth factor receptor, albumin, v-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian), v-akt murine thymoma viral oncogene homolog 1, v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian) and catenin (cadherin-associated protein), beta 1, 88kDa, and Myogenic differentiation 1, were introduced as key nodes of the network. These identified proteins are mostly involved in pathways and activities related to cancer.


CONCLUSION
The findings are consistent with significant roles for the introduced proteins in GA. This protein panel may be a useful probe in the management of GA.

Introduction
the certain proteins depend on the reciprocal affinity between them (12). Centrality parameters are among the most important topological indices of a network. Degree, betweenness and closeness are three well-known centrality parameters that are frequently used in PPI network analysis. The number of edges that connect directly to a node is its degree (K), and a node with a high degree value is called a hub node. Two nodes of a network may be connected by multiple paths; a path with the minimum number of edges is called a shortest path, and its length is the distance between the two nodes (11). The betweenness centrality of a node (for example node n) is calculated in three steps. First, all possible pairs of nodes in the network (excluding node n) are determined. Second, for each pair, the ratio of the number of shortest paths between the pair that pass through node n to the number of all shortest paths between the pair is determined. Third, the calculated ratios are summed; the result, normalized so that 0≤BC≤1, is the betweenness (BC) of node n. A node with a high betweenness value is called a bottleneck node (13). A node with high values of both degree and betweenness is known as a hub-bottleneck node (14). Closeness, the other centrality parameter, is defined as the inverse of the average length of the shortest paths from a node to all other nodes. Like betweenness, closeness centrality (CC) lies in the range 0-1 (11).
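The three centrality parameters can be illustrated on a toy network. The following Python sketch is not the Cytoscape implementation used in this study; it assumes an unweighted, undirected graph stored as an adjacency dictionary, and the five-node example network and all function names are hypothetical. Betweenness is computed with Brandes' algorithm and divided by the number of node pairs that exclude the node, so that 0≤BC≤1.

```python
from collections import deque


def bfs_shortest_paths(adj, source):
    """Breadth-first search from `source` on an unweighted graph.

    Returns distances, the number of shortest paths to each node (sigma),
    the visiting order, and the shortest-path predecessors of each node.
    """
    dist = {source: 0}
    sigma = {v: 0 for v in adj}
    sigma[source] = 1
    preds = {v: [] for v in adj}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, order, preds


def degree(adj):
    """Degree (K): number of edges attached to each node."""
    return {v: len(neighbors) for v, neighbors in adj.items()}


def closeness(adj):
    """Closeness (CC): inverse of the mean distance to all reachable nodes."""
    cc = {}
    for v in adj:
        dist, _, _, _ = bfs_shortest_paths(adj, v)
        total = sum(dist.values())
        cc[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return cc


def betweenness(adj):
    """Betweenness (BC) via Brandes' algorithm, normalized to [0, 1]."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        _, sigma, order, preds = bfs_shortest_paths(adj, s)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # accumulate from the farthest nodes back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each unordered pair is counted twice (once per endpoint as source),
    # so dividing by (n-1)(n-2) normalizes by the number of pairs.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: value * scale for v, value in bc.items()}


# Hypothetical toy network: A is linked to B, C and D; D is linked to E.
toy = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A", "E"],
    "E": ["D"],
}

K = degree(toy)
BC = betweenness(toy)
CC = closeness(toy)
```

In this toy network node A has the highest degree and lies on the shortest paths of five of the six node pairs that exclude it, so it would be both a hub and a bottleneck.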
The regulation of numerous genes depends on the incidence and progression of a disease (15-17). This relationship is discovered via classical research or high-throughput investigation (18-20). Therefore, "Which of these genes is critically involved in the disease?" is a challenging question in medicine. One important screening method for this purpose is PPI network analysis (21), in which the genes are ranked by their topological properties in the interactome. In this way, analysis of a large set of genes leads to a reduced and restricted candidate biomarker panel (22,23). Gene ontology can be used to determine the molecular functions, biological processes, cellular components and biological pathways associated with the analyzed proteins (24). In this study, 200 genes related to gastric adenocarcinoma were retrieved from the String database, the corresponding PPI network was constructed with Cytoscape software, and the network was analyzed topologically.

Material and Methods
Different sources are available for retrieving the proteins related to a disease. One of the important ones is Cytoscape 3.4. This widely used software is free and is compatible with different data sources. Cytoscape and its applications are powerful tools that provide useful data and information for mapping a PPI network. One of the well-known interaction sources is the String Database (SD) (http://string-db.org/) (25,26), which can be accessed through Cytoscape. SD offers three query options: protein, PubMed and disease. In this paper, 200 genes related to gastric adenocarcinoma were retrieved through a disease query of SD. The corresponding PPI network was constructed and its topological parameters were determined. The cutoff for interaction evidence was set at 0.5. Topological analysis provided the degree, betweenness and closeness centralities. The disease score, which shows the relation between the disease and the retrieved protein, was also determined. Nodes with a high degree (number of connections) are known as hub nodes; the top 20 nodes by degree were selected as hub genes. The betweenness centrality (BC) of a node refers to the amount of control it exerts over the other nodes. Nodes with high betweenness are called bottlenecks (27). These elements are crucial for disease onset and progression (28). Hub nodes with a high betweenness value are considered hub-bottleneck nodes (27). In this research, the cutoffs for degree and betweenness were 60 and 0.03, respectively. Gene ontology analysis of the crucial nodes was performed with the ClueGO application and covered pathways and molecular function (MF). Pathways that included at least 4 of the genes, with these genes accounting for at least 4% of the pathway, were selected as relevant. The pathways were grouped, and each group was named after the pathway that included the largest number of the genes.
Terms that included at least 2 of the genes, with these genes accounting for at least 3% of the term, were selected as the involved MFs. The MFs were grouped, and each group was named after the MF that included the largest number of the genes. Similar enrichments were included in the same cluster (29).
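The term-selection criteria above (a minimum number of genes and a minimum percentage of attribution) can be sketched as a simple filter. This is not ClueGO's code. It assumes that the percentage of attribution is the share of all genes annotated to a term that belong to the analyzed panel; the function name, gene identifiers and term names below are hypothetical placeholders, not results of this study.

```python
def select_terms(term_genes, panel, min_genes, min_percent):
    """Keep terms that contain enough panel genes.

    term_genes maps a term name to the set of all genes annotated to it.
    A term is kept when at least `min_genes` panel genes belong to it and
    these genes make up at least `min_percent` percent of the term
    (assumed definition of the percentage of attribution).
    """
    selected = {}
    for term, genes in term_genes.items():
        hits = panel & genes
        percent = 100.0 * len(hits) / len(genes) if genes else 0.0
        if len(hits) >= min_genes and percent >= min_percent:
            selected[term] = (sorted(hits), percent)
    return selected


# Hypothetical panel and annotations, for illustration only.
panel = {"P1", "P2", "P3"}
term_genes = {
    "term_A": {"P1", "P2", "X1", "X2"},                    # 2 hits, 50%
    "term_B": {"P1", "X1"},                                # 1 hit, 50%
    "term_C": {"P1", "P2"} | {f"Y{i}" for i in range(98)}, # 2 hits, 2%
}

# Molecular-function criteria from the text: at least 2 genes and 3%.
mf_terms = select_terms(term_genes, panel, min_genes=2, min_percent=3.0)
```

With these placeholder annotations only term_A passes: term_B has too few panel genes and term_C falls below the percentage threshold. Calling the same function with min_genes=4 and min_percent=4.0 would correspond to the pathway criteria.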

Results
The PPI network for gastric adenocarcinoma was constructed from 200 nodes retrieved from the String databank. The network includes 57 isolated nodes, one pair of connected nodes and a main connected component of 141 nodes and 1508 edges (see figure 1). For better resolution, the top 20 nodes by degree were selected and the other nodes were deleted from the network (the nodes and the related edges are represented in figure 2). The names of the 20 nodes, their centrality parameters (degree, betweenness and closeness) and their disease scores are presented in table 1. To narrow the 20 hub nodes down to the crucial genes, the nodes with degree less than 60 and betweenness under 0.03 were deleted. Seven key proteins (hub-bottleneck proteins) were selected, and their characteristic parameters are shown in table 2. This panel includes tumor protein p53, epidermal growth factor receptor, albumin, v-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian), v-akt murine thymoma viral oncogene homolog 1, v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian) and catenin (cadherin-associated protein), beta 1, 88kDa, and Myogenic differentiation 1. The nodes of the connected component (except these seven nodes) were deleted, and the remaining nodes are shown in figure 3. This subnetwork includes seven nodes and 21 edges. Since pathway analysis is a useful tool for determining the role of an individual protein, pathway analysis of the seven key proteins was performed, and the findings are represented in figures 4 and 5. Molecular function analysis can provide useful information about the role of the studied proteins (24). The results of molecular function analysis for the seven introduced hub-bottleneck proteins are shown in figures 6 and 7. The minimum percentage of attribution of the genes in a term was 4%, and a term was required to contain at least four genes.
The maximum P value was less than 0.001. Terms shown in the same color belong to the same group.
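The hub-bottleneck screening reported in this section can be sketched in a few lines of Python. The sketch assumes that a node is retained only when it is among the top nodes by degree and meets both the degree and the betweenness cutoffs; the node names and centrality values are invented placeholders, not the values of tables 1 and 2, and the function names are hypothetical. The second helper gives the number of edges in a fully connected subnetwork, which for seven nodes equals the 21 edges reported for the subnetwork of figure 3.

```python
def screen_hub_bottlenecks(nodes, top_n, min_degree, min_betweenness):
    """Select hub-bottleneck nodes.

    nodes is a list of dicts with keys "name", "degree" and "betweenness".
    Step 1: keep the `top_n` nodes with the highest degree (hubs).
    Step 2: among the hubs, keep those that reach both cutoffs
            (assumed reading of the screening rule).
    """
    hubs = sorted(nodes, key=lambda r: r["degree"], reverse=True)[:top_n]
    return [
        r["name"]
        for r in hubs
        if r["degree"] >= min_degree and r["betweenness"] >= min_betweenness
    ]


def complete_graph_edges(k):
    """Number of edges when each of k nodes is linked to all k - 1 others."""
    return k * (k - 1) // 2


# Placeholder centrality values, for illustration only.
example_nodes = [
    {"name": "n1", "degree": 90, "betweenness": 0.08},
    {"name": "n2", "degree": 75, "betweenness": 0.01},
    {"name": "n3", "degree": 62, "betweenness": 0.04},
    {"name": "n4", "degree": 55, "betweenness": 0.05},
    {"name": "n5", "degree": 20, "betweenness": 0.00},
]

# top_n is set to 4 for this small example; the study used the top 20
# nodes with cutoffs of 60 (degree) and 0.03 (betweenness).
kept = screen_hub_bottlenecks(
    example_nodes, top_n=4, min_degree=60, min_betweenness=0.03
)
edges_in_full_panel = complete_graph_edges(7)
```

Here n2 is dropped for low betweenness and n4 for low degree, so only n1 and n3 remain as hub-bottleneck candidates.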

Discussion
Protein interaction mapping, as a molecular screening probe, has attracted scientists' attention and is applied as a powerful analytical method in medical investigations (30). In network medicine, the topological features of a specific disease are assessed to introduce the crucial genes or proteins involved in the disease. This information highlights several proteins that are essential in terms of interactions and that may be key proteins in disease onset and progression (31). These proteins can be considered diagnostic or therapeutic biomarkers that, after validation tests, may be introduced for clinical use. The aim of this study was [...] nodes in table 1 (with the same priority), and CTNNB1 as the 9th node, remained after the screening (see table 2). CC also follows approximately the same trend as K and BC. It seems that these seven proteins are important for network integrity. Figure 3 shows the main connected component (presented in figure 1) after all nodes except these seven were omitted. Each node is directly connected to its six neighbor nodes, giving 21 edges in this subnetwork. As discussed, these nodes are densely connected. Because each disease has specific related pathways, pathway analysis of these seven proteins can provide essential evidence for the crucial role of this protein panel in GA. The pathways involving the seven critical proteins are shown in figure 4. There are 17 pathways in three clusters, and each pathway includes at least four of the seven introduced proteins. Six proteins are attributed to the Rac1/Pak1/P38/MMp-2 pathway. The regulatory role of this pathway in angiogenesis in ovarian cancer is well known (32). Five pathways, which mostly relate to various cancers, each include five of the proteins. The minimum and maximum percentages of attribution of these proteins in the pathways are 4% and 14%, respectively.
Based on figures 4 and 5, nearly all of the identified pathways are related to cancer. One of the biochemical features of diseases is a regulatory change in the activity of many enzymes or proteins (33,34). This alteration is related to expression changes in many of the involved genes. Molecular function analysis of a given protein set is a useful tool for revealing the important role of those proteins in the incidence and progression of a disease (24). As shown in figures 6 and 7, molecular function analysis indicates that the crucial proteins are involved in 10 biochemical functions categorized in eight clusters. Five proteins are related to each of the phosphatase binding and protein phosphatase binding activities. Two proteins participate in each of the other activities. The highest rate of attribution (33%) occurred in nitric-oxide synthesis regulatory activity.
The nitric oxide synthase family has three isoforms, which are involved in cancer. This activity has been reported in tumor cells of several histogenetic origins and is associated with important aspects of cancer grade and development. A high level of this activity is associated with inhibition of tumor progression, and reduced activity is accompanied by promotion of tumor growth (35).
In conclusion, a limited candidate biomarker panel related to gastric cancer is introduced. The pathway analysis and molecular function assessment are consistent with the crucial role of the highlighted proteins. Further investigation in the field can serve as validation for practical application of the findings.